
    Putting LSPI into practice for adaptive linear quadratic control of an active air-cushion manipulation surface.

    This article presents the application of the LSPI algorithm of Lagoudakis & Parr (2003) to the control of a linear system with quadratic cost, following the protocol originally proposed by Bradtke (1993). The controlled device is an active surface that can move an object on an air cushion and whose dynamics vary strongly with the object used. The learning method is validated in simulation before being applied to the real system. The experimental results highlight the need to format the commands generated by the algorithm. This formatting is intended to prevent the generation of infeasible commands, which introduce a bias into the value-function update. Learning then converges to the same solution as linear quadratic control.

    The world of Independent learners is not Markovian.

    In multi-agent systems, the presence of learning agents can cause the environment to be non-Markovian from an agent's perspective, thus violating the property that traditional single-agent learning methods rely upon. This paper formalizes some known intuition about concurrently learning agents by providing formal conditions that make the environment non-Markovian from an independent (non-communicative) learner's perspective. New concepts are introduced, such as divergent learning paths and the observability of the effects of others' actions. To illustrate the formal concepts, a case study is also presented. These findings are significant because they help to explain the failures and successes of existing learning algorithms and are suggestive for future work.

    A new contactless conveyor system for handling clean and delicate products using induced air flows.

    In this paper, a new contactless conveyor system based on an original aerodynamic traction principle is described and tested experimentally. The device can convey flat objects such as silicon wafers, glass sheets or foodstuffs without any contact, thanks to an air cushion and induced air flows. A model of the system is established and its parameters are identified. A closed-loop control is proposed for one-dimensional position control and position tracking. The PID controller performs well for different reference signals. Its robustness to object changes and its perturbation rejection are also tested.
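
The closed-loop PID position control mentioned in the abstract above can be illustrated with a minimal sketch. The plant model (a unit-mass object with viscous damping), the gains, the time step and every name below are assumptions chosen for illustration; none of them comes from the paper.

```python
def simulate_pid(reference=1.0, kp=4.0, ki=1.0, kd=3.0,
                 damping=0.5, dt=0.01, duration=30.0):
    """Simulate 1-D PID position control of a hypothetical damped unit mass."""
    position, velocity = 0.0, 0.0
    integral = 0.0
    # Start from the initial error so the derivative term has no initial kick.
    previous_error = reference - position
    trajectory = []
    for _ in range(int(duration / dt)):
        error = reference - position
        integral += error * dt
        derivative = (error - previous_error) / dt
        # PID law: proportional + integral + derivative action.
        force = kp * error + ki * integral + kd * derivative
        # Assumed plant: acceleration = force - damping * velocity (unit mass).
        acceleration = force - damping * velocity
        velocity += acceleration * dt
        position += velocity * dt
        previous_error = error
        trajectory.append(position)
    return trajectory


if __name__ == "__main__":
    final_position = simulate_pid()[-1]
    print(f"final position: {final_position:.4f}")
```

Changing `reference` or `damping` gives a rough feel for tracking behaviour and for sensitivity to a change of object, under the assumed plant only.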

    A new Aerodynamic traction principle for handling products on an Air Cushion.

    This paper introduces a new aerodynamic traction principle for handling delicate and clean products, such as silicon wafers, glass sheets or flat foodstuffs. The product is carried on a thin air cushion and transported along the system by induced air flows. This induced air flow is the indirect effect of strong vertical air jets that pull the surrounding fluid. The paper provides a qualitative explanation of the operating principles and a description of the experimental device. The first experimental results with active control are presented. The maximum velocity and acceleration that can be obtained for the considered device geometry meet the requirements for industrial applications.

    2-DOF Contactless Distributed Manipulation Using Superposition of Induced Air Flows.

    Many industries require contactless transport and positioning of delicate or clean objects such as silicon wafers, glass sheets, solar cells or flat foodstuffs. The authors have presented a new form of contactless distributed manipulation using induced air flow. Previous work concerned the evaluation of the maximal velocity of transported objects and one-degree-of-freedom position control of objects. This paper introduces an analytic model of the velocity field of the induced air flow according to the spatial configuration of vertical air jets. Two-degrees-of-freedom position control is then investigated by exploiting the linearity property of the model. Finally, the model is validated under closed-loop control and the performance of the position control is evaluated.

    Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems.

    In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, nonstationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive FMQ and WoLF PHC. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
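
Three of the Q-learning variants named above differ only in how an independent learner updates its own table. The sketch below shows those update rules as they are commonly stated in the literature; the tabular dictionary layout, function names and default parameter values are assumptions for illustration, and recursive FMQ and WoLF PHC are not covered.

```python
from collections import defaultdict


def td_target(q, reward, next_state, actions, gamma):
    """One-step target: reward plus discounted best value in the next state."""
    return reward + gamma * max(q[(next_state, b)] for b in actions)


def decentralized_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Plain Q-learning applied by each agent to its own action only."""
    q[(s, a)] += alpha * (td_target(q, r, s_next, actions, gamma) - q[(s, a)])


def distributed_update(q, s, a, r, s_next, actions, gamma=0.9):
    """Optimistic update: the stored value never decreases.

    Usually stated for deterministic games with non-negative rewards
    and a table initialised to zero.
    """
    q[(s, a)] = max(q[(s, a)], td_target(q, r, s_next, actions, gamma))


def hysteretic_update(q, s, a, r, s_next, actions,
                      alpha=0.1, beta=0.01, gamma=0.9):
    """Two learning rates: alpha for increases, a smaller beta for decreases."""
    delta = td_target(q, r, s_next, actions, gamma) - q[(s, a)]
    rate = alpha if delta >= 0 else beta
    q[(s, a)] += rate * delta


if __name__ == "__main__":
    q = defaultdict(float)  # hypothetical table keyed by (state, own action)
    hysteretic_update(q, s=0, a=0, r=1.0, s_next=1, actions=[0, 1])
    hysteretic_update(q, s=0, a=0, r=-1.0, s_next=1, actions=[0, 1])
    print(q[(0, 0)])
```

With `beta` smaller than `alpha`, a hysteretic learner discounts penalties that may be caused by teammates' exploration, while the distributed rule ignores them entirely.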

    SOaN: an algorithm for the coordination of learning, non-communicating agents.

    Reinforcement learning in multi-agent systems is a very active research area, as shown by recent surveys [Busoniu et al., 2008, Sandholm, 2007, Bab & Brafman, 2008, Vlassis, 2007]. In particular, Lauer and Riedmiller showed that, under certain assumptions, agents that learn simultaneously can coordinate their actions without any communication and without perceiving the actions of their fellow agents [Lauer & Riedmiller, 2000]. This property is particularly useful for finding cooperation strategies in large multi-agent systems.

    Calibration and Validation of XY Micropositioners with Vision.

    Accuracy is a very important criterion for micromanipulation systems, especially for microassembly. In this paper, we propose a full procedure of kinematic calibration and validation for XY micropositioners, which are used for coarse positioning in our microassembly platform. Based on vision, two methods (self-calibration and classical calibration) are presented, implemented, tested and compared. The differential evolution (DE) algorithm is applied to identify the kinematic parameters. After calibration, we test accuracy and repeatability by controlling the micropositioners via inverse kinematics.

    Coordination of independent learners in cooperative Markov games.

    In the framework of fully cooperative multi-agent systems, independent agents learning by reinforcement must overcome several difficulties, such as coordination and the impact of exploration. Studying these issues first makes it possible to synthesize the characteristics of existing decentralized reinforcement learning methods for independent learners in cooperative Markov games. Then, given the difficulties encountered by these approaches, we focus on two main skills: optimistic agents, which manage coordination in deterministic environments, and the detection of the stochasticity of a game. Indeed, the key difficulty in stochastic environments is to distinguish between the various causes of noise. The SOoN algorithm, standing for “Swing between Optimistic or Neutral”, is therefore introduced; with it, independent learners can adapt automatically to the stochasticity of the environment. Empirical results on various cooperative Markov games show that SOoN overcomes the main factors of non-coordination and is robust to the exploration of other agents.

    Localization, epidemic transitions, and unpredictability of multistrain epidemics with an underlying genotype network

    Mathematical disease modelling has long operated under the assumption that any one infectious disease is caused by one transmissible pathogen spreading among a population. This paradigm has been useful in simplifying the biological reality of epidemics and has allowed the modelling community to focus on the complexity of other factors such as population structure and interventions. However, there is an increasing amount of evidence that the strain diversity of pathogens, and their interplay with the host immune system, can play a large role in shaping the dynamics of epidemics. Here, we introduce a disease model with an underlying genotype network to account for two important mechanisms. One, the disease can mutate along network pathways as it spreads in a host population. Two, the genotype network allows us to define a genetic distance across strains and therefore to model the transcendence of immunity often observed in real world pathogens. We study the emergence of epidemics in this model, through its epidemic phase transitions, and highlight the role of the genotype network in driving cyclicity of diseases, large scale fluctuations, sequential epidemic transitions, as well as localization around specific strains of the associated pathogen. More generally, our model illustrates the richness of behaviours that are possible even in well-mixed host populations once we consider strain diversity and go beyond the "one disease equals one pathogen" paradigm